Introduction

The Explorer lets you perform various statistical analyses and data mining operations in an easy and intuitive way. As the name implies, the software is meant for exploring data and getting a quick sense of the order of magnitude of the observed quantities. That is why it focuses on graphical representation and mouse-driven operations, unlike traditional statistical tools cluttered with numerous dialog boxes and lists of figures with five decimals. You can, however, get the detailed numbers once your analysis is completed.

Videos

Overview
Contingency table
Weather data
Animation

Screenshots

Installation and run
Build from Source
Data loading
Main window
Graph
Tools
Selection
Conversions
Units
Types of analyses
In the browser
Credits
Contact

Installation and run

The Explorer is written in JavaScript and built with Electron.

OSX

Download the latest version for darwin from the release page.

Windows

Download the latest version corresponding to your system (32-bit or 64-bit) from the release page. The application is bundled into a single exe file, thanks to BoxedApp Packer.

Linux

Follow the "Build from source" instructions below.

Build from Source

If you want to build the application yourself, you will need node.js (developed on v6.1.0, confirmed to work on v4.7.3) and npm (bundled with node.js; developed using v3.9.5, confirmed to work on v2.15.11).

Download and unzip the source files (zip or tar.gz) from the release page, or clone the repository:

git clone https://github.com/jfbouzereau/explorer.git

Enter the Explorer's directory with cd explorer-1.x/app (if you downloaded it from Releases) or cd explorer/app (if you cloned the repository).

Install the dependencies:

npm install

And launch the app:

npm start

Data loading

At launch time, the Explorer shows a window to choose the dataset to use. You can either drag and drop a file from your computer desktop, or click the clipboard button.

Various file formats are accepted:

Source	File extension	Remarks
Access	mdb , accdb	Access 2000 or higher
ARFF / KEEL	*	No comments at the beginning of the file. The first line must be @relation
BigQuery	*	A config file with a content like this: BigQuery client_secret:/full/path/to/my_private_key.json query:select * from lookerdata:cdc.project_tycho_reports limit 1000 timeout:60000
dBase	dbf
Excel	xlsx	The names of the fields are expected at the top of the columns
JMP	jmp
JSON file	*	A JSON array of records
LIMDEP / NLOGIT	lpj
MINITAB	mtw
MLwiN	ws	Uncompressed format only
MongoDB	*	A config file with a content like this: mongodb host:192.168.0.121:27017 database:geo collection:countries query:{cont:{$eq:"EU"},pop:{$gt:50000000}}
Mysql	*	A config file with a content like this: mysql host:192.168.0.2 user:bob password:secret database:test query:select * from mytable
Postgres	*	A config file with a content like this: postgres host:192.168.0.2 user:bob password:secret database:test query:select * from mytable or: postgres connection:bob:secret@192.168.0.2/test query:select * from mytable
R	rdb	Binary format only
SAS	sas7bdat	Uncompressed format only
SPLUS	sdd
SPSS	sav	Uncompressed format only
SQL Server	*	A config file with a content like this: mssql host:192.168.0.121 username:bob password:secret query:select * from mytable
Stata	dta	Stata 8 or higher
Tabular file	*	The names of the fields are expected on the first line
Bzip2 file	bz2	The uncompressed file must be in one of the previous formats
Gzip file	gz	The uncompressed file must be in one of the previous formats
Web file	*	Contains the url of the data. The remote file must be in one of the previous formats

If you click the clipboard button, the data must be in tabular form, with the names of the fields on the first line.
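To illustrate the tabular form described above, here is a minimal sketch in JavaScript. It is not taken from the Explorer's source: the function name `parseTabular` is hypothetical, and the sketch assumes tab-separated values, since the separators the Explorer actually accepts are not stated here.

```javascript
// Hypothetical helper: turns tabular text into an array of records.
// Assumption: fields are separated by tabs and records by newlines.
function parseTabular(text) {
  const lines = text.split(/\r?\n/).filter((line) => line.length > 0);
  if (lines.length === 0) return [];
  const names = lines[0].split("\t"); // the first line holds the field names
  return lines.slice(1).map((line) => {
    const cells = line.split("\t");
    const record = {};
    names.forEach((name, i) => {
      // Missing trailing cells are stored as empty strings.
      record[name] = cells[i] !== undefined ? cells[i] : "";
    });
    return record;
  });
}

const records = parseTabular("ID\tCOLOR\n1\tBlue\n2\tRed\n");
console.log(records);
```

The result is an array of records, the same shape as the "JSON array of records" accepted for JSON files.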

Main window

Once the data have been successfully loaded, the main window is displayed:

Here are the elements of the interface:

List of the categorical fields (aka "the pink zone"). By default only 10 fields are displayed. To resize the list, move the mouse just below the list and drag to shrink or extend the list. To scroll the list, move the mouse to the right of the list.
Icons of the existing analyses (graphs). To run a new analysis, just drag its icon to the workspace.
List of the numerical fields (aka "the blue zone"). By default only 10 fields are displayed. To resize the list, move the mouse just below the list and drag to shrink or extend the list. To scroll the list, move the mouse to the right of the list.
Icons of the tools
Status bar. This area gives at any time details about the object under the mouse, or the action you are about to perform.
Dock. This area is used to keep graphs that are temporarily removed from the workspace.
Version number
Memory usage
Workspace. This area is where the graphs are created and arranged.

Graph

To create a new graph, drag its icon to the workspace. Alternatively, if you don't know which icon to look for, you can right-click or control-click on the workspace to get a menu with all the possible analyses.

A graph is represented by an area with several distinct parts:

Close box. Click on this box to close the graph. All the computations done will be lost.
Option menu. Some graphs have different ways of representing the results. In that case click on this sign to bring up the menu to choose from. Alternatively, right-click or control-click within the graph.
Title bar. This area shows the current selection (see below). Click on this area to drag the graph around.
Slots. These are the places where you can define the parameters of the analysis. Depending on the graph, different combinations of slots are shown. You can drag a categorical field onto a pink slot, and a numerical field onto a blue slot. Parameters can be swapped by dragging from one slot to another (within the same graph, and between slots of the same color).
Resize box. Click on this box and drag to resize the graph.

To change the type of a graph, drag the icon of the new type onto the graph. The new analysis will retain the parameters and selection of the previous one.

Selection

Every analysis can be restricted to a part of the data only. The set of observations (records) currently processed by a graph is called the selection, and is displayed in the title bar. Initially, the selection consists of all the observations, and the title is blank.

Selection based on a categorical field

Use a type of graph that splits the dataset into the desired groups: pie chart, bar chart, or treemap.
Drag the slice of the group to be processed out of the graph, onto the workspace.
This creates a new pie chart with a selection equal to the slice's category.
Drag the icon of the desired analysis onto this second graph. It will change its type, but will retain the selection. The type of graph can be changed as many times as you wish; all the analyses will be conducted on the same selection.

Conversely, the selection of an existing graph can be changed by dragging a pie slice onto its title. This makes it possible to run the same analysis successively on different parts of the data.

Selection based on a numerical field

Drag a numerical field from the blue zone to the title of an existing graph. The selection will consist of all the observations with a non-null value of the field. Typically a dummy variable (with values 0 or 1) would be used for this, but not necessarily.

Combining selections

Dragging a slice to the title of a graph which already has a selection will combine the two sets.

If the two variables are the same, the resulting selection will be the union of the two sets. Example: a pie graph splits the data into Apples, Pears, Peaches, and Apricots. If you drag the apple slice to the title of another graph, the selection will be Apples. If you then drag the peach slice to the title of the graph, the selection will be Apples + Peaches.

If the two variables are not the same, the resulting selection will be the intersection of the two sets. Example: a pie graph splits the data into Apples, Pears, Peaches, and Apricots. If you drag the apple slice to the title of another graph, the selection will be Apples. If you change the variable defining the pie to split the data into Organic and Non-Organic, and drag the Organic slice to the title of the second graph, the selection will be Apples AND Organic.
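The union and intersection rules above can be sketched in JavaScript as follows. This is only an illustrative model, not the Explorer's implementation: the functions `combine` and `matches`, the field names FRUIT and KIND, and the representation of a selection as a map from field name to accepted values are all assumptions.

```javascript
// Hypothetical model: a selection maps a field name to the set of accepted values.
// Same field  -> the value is added to the existing set (union).
// Other field -> a new constraint is added (intersection).
function combine(selection, field, value) {
  const next = new Map();
  for (const [name, values] of selection) next.set(name, new Set(values));
  if (!next.has(field)) next.set(field, new Set());
  next.get(field).add(value);
  return next;
}

// A record belongs to the selection when it satisfies every field constraint.
function matches(selection, record) {
  for (const [field, values] of selection) {
    if (!values.has(record[field])) return false;
  }
  return true;
}

const data = [
  { FRUIT: "Apples", KIND: "Organic" },
  { FRUIT: "Apples", KIND: "Non-Organic" },
  { FRUIT: "Peaches", KIND: "Organic" },
  { FRUIT: "Pears", KIND: "Organic" },
];

let sameField = combine(new Map(), "FRUIT", "Apples");
sameField = combine(sameField, "FRUIT", "Peaches"); // Apples + Peaches
const unionCount = data.filter((r) => matches(sameField, r)).length;

let otherField = combine(new Map(), "FRUIT", "Apples");
otherField = combine(otherField, "KIND", "Organic"); // Apples AND Organic
const intersectionCount = data.filter((r) => matches(otherField, r)).length;

console.log(unionCount, intersectionCount); // 3 1
```

An empty map stands for the initial selection, which contains all the observations.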

Conversions

When loading the data, the Explorer identifies fields containing only numbers as numerical, and all other fields as categorical. Sometimes it is desirable to change this. Several possibilities exist.

Drag a numerical field to the pink zone. The field is converted to categorical: the values are the same, but stored as strings of characters.
Drag a categorical field to the blue zone. Each category gives a dummy variable of the same name. Therefore, there are as many dummies as categories of the initial field, and the dummies are mutually exclusive. Example: COLOR is the categorical field converted.

Original data:

ID	COLOR
1	Blue
2	Red
3	Green
4	Red

Data after the conversion:

ID	Blue	Red	Green
1	1	0	0
2	0	1	0
3	0	0	1
4	0	1	0

Drag the special numerical field "1" to the pink zone. This "pivots" the data. Each numerical field becomes a category of a new PIVOT field, whose value is in a new COUNT field. Each original record gives as many records as the number of numerical fields. Example: HEIGHT, WIDTH and DEPTH are the numerical fields.

Original data:

ID	COLOR	HEIGHT	WIDTH	DEPTH
1	Blue	142	25	11
2	Red	175	12	16
3	Green	109	48	14

Data after the pivot:

ID	COLOR	PIVOT	COUNT
1	Blue	HEIGHT	142
1	Blue	WIDTH	25
1	Blue	DEPTH	11
2	Red	HEIGHT	175
2	Red	WIDTH	12
2	Red	DEPTH	16
3	Green	HEIGHT	109
3	Green	WIDTH	48
3	Green	DEPTH	14
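The dummy-variable conversion and the pivot shown in the tables above can be sketched in JavaScript as follows. The helpers `toDummies` and `pivot` are hypothetical names used for illustration, and the list of numerical fields is passed explicitly as an assumption; the Explorer's own code may work differently.

```javascript
// Hypothetical helpers mirroring the two conversions illustrated above.

// Categorical -> numerical: one 0/1 dummy field per category.
function toDummies(records, field) {
  const categories = [...new Set(records.map((r) => r[field]))];
  return records.map((r) => {
    const { [field]: value, ...rest } = r; // drop the original field
    for (const c of categories) rest[c] = value === c ? 1 : 0;
    return rest;
  });
}

// Pivot: each numerical field becomes a category of PIVOT,
// and its value goes into COUNT. One input record gives
// as many output records as there are numerical fields.
function pivot(records, numericFields) {
  return records.flatMap((r) => {
    const kept = {};
    for (const key of Object.keys(r)) {
      if (!numericFields.includes(key)) kept[key] = r[key];
    }
    return numericFields.map((f) => ({ ...kept, PIVOT: f, COUNT: r[f] }));
  });
}

const colors = [
  { ID: 1, COLOR: "Blue" },
  { ID: 2, COLOR: "Red" },
  { ID: 3, COLOR: "Green" },
  { ID: 4, COLOR: "Red" },
];
const dummies = toDummies(colors, "COLOR");
console.log(dummies[1]); // { ID: 2, Blue: 0, Red: 1, Green: 0 }

const sizes = [
  { ID: 1, COLOR: "Blue", HEIGHT: 142, WIDTH: 25, DEPTH: 11 },
  { ID: 2, COLOR: "Red", HEIGHT: 175, WIDTH: 12, DEPTH: 16 },
  { ID: 3, COLOR: "Green", HEIGHT: 109, WIDTH: 48, DEPTH: 14 },
];
const pivoted = pivot(sizes, ["HEIGHT", "WIDTH", "DEPTH"]);
console.log(pivoted.length); // 9
```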

Units

All the analyses applied to categorical fields (whose icon is pink) count the observations. For example, in a pie chart the slices are proportional to the number of observations in each category. Sometimes the counts have to be weighted. This is done by changing the "unit" of the graph, by dragging a numerical field onto the graph. The title of the graph turns blue to indicate that the counts are weighted. The status bar also shows the values or percentages in the new unit. To remove the unit and go back to normal counting, drag the special field "1" onto the graph.
All the analyses that represent data points in a 2D plane (scatter plot, PCA, discriminant analysis, ternary plot, etc.) can also be modified. If a numerical field is set as unit, the data points are displayed as circles whose size is proportional to the unit.
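The difference between plain counting and counting with a unit, as described for categorical analyses, can be sketched in JavaScript as follows. The helper `countBy` and the sample fields COLOR and QUANTITY are hypothetical and only illustrate the idea; they are not part of the Explorer.

```javascript
// Hypothetical helper: totals per category, optionally weighted by a numerical field.
// Without a unit, every observation counts for 1 (the special field "1").
function countBy(records, field, unit) {
  const totals = {};
  for (const r of records) {
    const weight = unit === undefined ? 1 : Number(r[unit]);
    totals[r[field]] = (totals[r[field]] || 0) + weight;
  }
  return totals;
}

const sales = [
  { COLOR: "Blue", QUANTITY: 10 },
  { COLOR: "Red", QUANTITY: 3 },
  { COLOR: "Red", QUANTITY: 2 },
];

const plain = countBy(sales, "COLOR"); // { Blue: 1, Red: 2 }
const weighted = countBy(sales, "COLOR", "QUANTITY"); // { Blue: 10, Red: 5 }
console.log(plain, weighted);
```

With plain counting, Red is the larger slice (2 observations against 1); with QUANTITY as unit, Blue becomes the larger one (10 against 5).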

Tools

Here are the tools offered by the toolbar at the bottom of the screen:

Sort: Drag this icon onto a field, or drag a field onto this icon, to sort the data in ascending order. Do the same sort again to sort in descending order. The sort is stable: to sort the data by a key consisting of field1,field2,field3, you must sort by field3 first, then field2, and finally field1.
Clone: Drag this icon onto a graph to get a copy of it, with the same parameters. If the computation is slow, this avoids running it a second time.
Add: Drag this icon to the pink or blue zone to create a new field. See below.
Help: Drag this icon onto a graph to get some information about the analysis, the results produced, the representation options, and the possible actions.
Picture: Drag this icon onto a graph to get its image in png format.
Table: Drag this icon to the pink or blue zone to get a table of the values of the dataset. Drag this icon onto a graph to get a table of the numerical results. They can be copied to the clipboard (with control-C or command-C) and pasted into another application.
Dustbin: Drag this icon onto a field, or drag a field onto this icon, to permanently remove the field (a field used by some graphs cannot be removed). Drag a pie slice, a bar, or a treemap slice onto this icon to permanently remove the corresponding records. The original input file is not modified.
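The multi-key sorting rule of the Sort tool relies on the sort being stable. Here is a minimal JavaScript sketch of the same idea, with a hypothetical helper `sortBy` and made-up sample fields; it relies on `Array.prototype.sort`, which is guaranteed to be stable since ECMAScript 2019.

```javascript
// Hypothetical helper: stable ascending sort on one field.
// Returns a sorted copy and leaves the input array untouched.
function sortBy(records, field) {
  return [...records].sort((a, b) =>
    a[field] < b[field] ? -1 : a[field] > b[field] ? 1 : 0
  );
}

const rows = [
  { CITY: "Paris", NAME: "Zoe" },
  { CITY: "Lyon", NAME: "Bob" },
  { CITY: "Paris", NAME: "Al" },
  { CITY: "Lyon", NAME: "Ann" },
];

// To sort by the key CITY,NAME: sort by the last field first, then by the first one.
const sorted = sortBy(sortBy(rows, "NAME"), "CITY");
console.log(sorted.map((r) => r.CITY + " " + r.NAME));
// [ 'Lyon Ann', 'Lyon Bob', 'Paris Al', 'Paris Zoe' ]
```

Because the second sort keeps records with the same CITY in their previous order, the NAME ordering from the first sort survives within each city.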

Types of analyses

Pie chart
Bar chart
Line chart
Association diagram
Word cloud
Arc diagram
Contingency table
Multiple Correspondence analysis
3-variable graph
Treemap
Chi-2 tests
- Pearson's chi-square test
- Yates' chi-square test
- G-test
- Fisher's exact test
Gini impurity
Entropy
Repartition curve
Distribution curve
Scatter plot
Ternary plot
Andrews curves
Survey plot
3D plot
Correlations
Autocorrelation plot
Probability plot
Tukey-lambda PPCC plot
Lag plot
General statistics
Normality tests
- Shapiro-Wilk test
- Anderson-Darling test
- Lilliefors test
- D'Agostino test
- Anscombe test
- Omnibus test
- Jarque-Bera test
Analysis of variance
- Bartlett's test
- F-test
- Levene test
- Brown-Forsythe test
- Box's M test
- Student's T-test
- Welch T-test
- Hotelling's test
- Wilks' lambda
- Lawley-Hotelling trace
- Pillai trace
- Two-way anova
Non-parametric tests
- Kolmogorov-Smirnov test
- Kruskal-Wallis test
- Jonckheere test
- Cochran Q test
- Durbin test
- Friedman test
- Mantel-Haenszel test
- Breslow-Day test
- Woolf test
Principal components
Canonical correlation analysis
K-means
K-medoids
Fuzzy C-means
Huen diagram
Dendrogram
Radviz
Discriminant analysis
Regressions
- Linear regression
- Poisson regression
- Negative binomial regression
- Logistic regression
- Least angle regression
Influence plot
QQ plot
Box plot
Parallel coordinates
Neural network (perceptron)

In the browser

The Explorer can also be executed in any modern browser. Open app/index.html, paste the data from the clipboard, and click OK.

Credits

The Explorer takes advantage of some very useful npm modules:

gapitoken Node.js module for Google API service account authorization
mongodb The official MongoDB driver for Node.js
pg Pure JavaScript PostgreSQL client for node.js
lzma-purejs pure JavaScript LZMA de/compression, for node.js
mysql A node.js driver for mysql
request Simplified HTTP request client
synaptic Architecture-free neural network library for node.js and the browser
tedious A TDS driver, for connecting to MS SQLServer databases

Contact

jfbouzereau@netcourrier.com
